Web Scraping: How to get info from dynamic pages?


























I'm a newbie in web scraping. I know how to get data from HTML or from JSON, but there is one case where I don't know how to do it. I would like to get the positions of the points and X's that you can see in the shot chart on this page:



http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart



How can I do that?










      html json python-3.x web-scraping beautifulsoup














      edited Nov 23 '18 at 14:06

























      asked Nov 23 '18 at 2:19









      José Carlos

























          1 Answer
































          I'm fairly new as well, but learning as I go. It looks like this page is dynamic, so you'd need to use Selenium to load the page first, before grabbing the HTML with BeautifulSoup to get the x and y coordinates of the made and missed shots. I gave it a shot and was able to get a dataframe with the x, y coordinates along with whether each shot was 'made' or 'missed'.



          I plotted it afterwards to check that it matched, and it appeared to be flipped about the x-axis. I believe this is because in on-screen graphics like this chart the top-left corner is (0,0) and y increases downward, so the y coordinates come out inverted when you plot them on a regular chart. I could be wrong though.
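To make the flip concrete, here is a minimal sketch in plain Python. The helper name `flip_y` and the chart height of 300 are assumptions for illustration only; the page's real chart size is not known here. Negating y is what the `* -1` step in the code does, while subtracting from a known height would keep the values positive.

```python
# Hypothetical helper: convert screen-style coordinates (origin at the
# top-left, y increasing downward) to plot-style coordinates (y increasing upward).
def flip_y(points, height=None):
    """Return the points with the y axis flipped.

    With height=None, y is simply negated. With a known chart height,
    y becomes height - y, which keeps the values positive.
    """
    flipped = []
    for x, y in points:
        new_y = -y if height is None else height - y
        flipped.append((x, new_y))
    return flipped


# Two sample points in screen coordinates.
svg_points = [(33.0, 107.0), (159.0, 160.0)]

print(flip_y(svg_points))              # [(33.0, -107.0), (159.0, -160.0)]
print(flip_y(svg_points, height=300))  # [(33.0, 193.0), (159.0, 140.0)]
```

Either variant gives the same picture; only the labels on the y axis differ.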



          Nonetheless, here's the code I used.



          import pandas as pd
          import bs4
          from selenium import webdriver

          # Path to the local chromedriver executable; adjust it for your machine.
          # Newer Selenium releases expect the driver location via a Service object instead.
          driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
          driver.get('http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart')

          # Grab the HTML after the browser has rendered the page.
          html = driver.page_source
          soup = bs4.BeautifulSoup(html, 'html.parser')

          made_shots = soup.find_all("svg", {"class": "shot-hit icon icon-point clickable"})
          missed_shots = soup.find_all("svg", {"class": "shot-miss icon icon-miss clickable"})

          def get_coordinates(elements, label):
              # One row per marker: x, y and whether the shot was made or missed.
              rows = [[float(point.get('x')), float(point.get('y')), label] for point in elements]
              return pd.DataFrame(rows, columns=['x', 'y', 'marker'])

          made_results = get_coordinates(made_shots, 'made')
          missed_results = get_coordinates(missed_shots, 'missed')

          # DataFrame.append was removed in pandas 2.0, so combine with concat.
          results = pd.concat([made_results, missed_results], ignore_index=True)

          # Flip y so the plot matches the chart on the page.
          results['y'] = results['y'] * -1

          driver.close()


          gives this output:



          In [6]: results.head(5)
          Out[6]:
                 x      y marker
          0   33.0 -107.0   made
          1  159.0 -160.0   made
          2  143.0 -197.0   made
          3   38.0 -113.0   made
          4   65.0 -130.0   made


          and when I plot it:



          import seaborn as sns
          import numpy as np

          # Add a color column: green for made shots, red for missed ones.
          made = results['marker'] == 'made'
          results['color'] = np.where(made, "green", "red")

          # Scatter plot without a regression line.
          sns.regplot(data=results, x="x", y="y", fit_reg=False, scatter_kws={'facecolors': results['color']})





          ADDITIONAL: I'm sure there's a better, more efficient, cleaner way to code this up, but doing it on the fly, I came up with this. It should get you going. Feel free to dive into it and look at the HTML source to see how it grabs the different pieces of data. Have fun.



          import pandas as pd
          import bs4
          from selenium import webdriver

          # Path to the local chromedriver executable; adjust it for your machine.
          driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
          driver.get('http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart')

          html = driver.page_source
          soup = bs4.BeautifulSoup(html, 'html.parser')


          ###############################################################################
          # Shots: one <g class="shot-item"> per shot, with details in data attributes.

          shots = soup.find_all("g", {"class": "shot-item"})

          shot_rows = []
          for point in shots:
              hit = point.get('data-play-by-play-action-hit')
              action_id = point.get('data-play-by-play-action-id')
              period = point.get('data-play-by-play-action-period')
              player_id = point.get('data-play-by-play-action-player-id')
              team_id = point.get('data-play-by-play-action-team-id')

              x_point = float(point.find('svg').get('x'))
              y_point = float(point.find('svg').get('y'))

              shot_rows.append([hit, action_id, period, player_id, team_id, x_point, y_point])

          results = pd.DataFrame(shot_rows,
                                 columns=['hit', 'action_id', 'period', 'player_id', 'team_id', 'x', 'y'])
          results['y'] = results['y'] * -1


          ###############################################################################
          # Players: map each player id to a name.

          player_ids = soup.find_all('label', {"class": "item-label"})

          player_rows = []
          for player in player_ids:
              player_id = player.find('input').get('data-play-by-play-action-player-id')
              if player_id is None:
                  continue

              player_name = player.find('span').text
              player_rows.append([player_id, player_name])

          players = pd.DataFrame(player_rows, columns=['player_id', 'player_name'])

          ###############################################################################
          # Teams: the id is the last part of the logo URL.

          team_ids = soup.find_all('div', {"class": "header-scores_desktop"})
          teams_A = team_ids[0].find('div', {"class": "team-A"})
          team_id_A = teams_A.find('img').get('src').rsplit('/')[-1]
          team_name_A = teams_A.find('span').text
          teams_B = team_ids[0].find('div', {"class": "team-B"})
          team_id_B = teams_B.find('img').get('src').rsplit('/')[-1]
          team_name_B = teams_B.find('span').text

          teams = pd.DataFrame([[team_id_A, team_name_A], [team_id_B, team_name_B]],
                               columns=['team_id', 'team_name'])

          ###############################################################################
          # Actions: play-by-play details (time, player, result, running score).

          action_ids = soup.find_all('div', {"class": "overlay-wrapper"})

          action_rows = []
          for action in action_ids:
              action_id = action.get('data-play-by-play-action-id')
              details = action.find('div')
              time_remaining = details.find('span', {'class': 'time'}).text
              full_name = details.find('span', {'class': 'athlete-name'}).text

              # Missed shots have no action code, so record them as '+0'.
              action_code = details.find('span', {'class': 'action-code'})
              result_of_action = action_code.text if action_code else '+0'

              action_description = details.find('span', {'class': 'action-description'}).text

              team_A_score = details.find('span', {'class': 'team-A'}).text
              team_B_score = details.find('span', {'class': 'team-B'}).text

              action_rows.append([action_id, time_remaining, full_name, result_of_action,
                                  team_A_score, team_B_score, action_description])

          actions = pd.DataFrame(action_rows,
                                 columns=['action_id', 'time_remaining', 'full_name', 'result_of_action',
                                          team_name_A + '_score', team_name_B + '_score', 'action-description'])


          ###############################################################################
          # Join everything onto the shots.

          results = pd.merge(results, players, how='left', on='player_id')
          results = pd.merge(results, teams, how='left', on='team_id')
          results = pd.merge(results, actions, how='left', on='action_id')

          driver.close()


          And to clean it up a bit, you can sort the rows so that they are in play-by-play order from start to finish:



          results.sort_values(['period', 'time_remaining'], ascending=[True, False], inplace=True)
          results = results.reset_index(drop=True)
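One thing to watch with that sort: `period` and `time_remaining` were scraped as text, so they are compared as strings. Below is a small sketch in plain Python of sorting on a numeric clock instead. It assumes the clock text looks like `MM:SS`, which I have not verified against the page, and the helper `clock_to_seconds` and the sample rows are made up for illustration. If the clock is always zero-padded, the plain string sort already gives the same order.

```python
def clock_to_seconds(clock):
    """Convert a 'MM:SS' game-clock string to a number of seconds."""
    minutes, seconds = clock.split(':')
    return int(minutes) * 60 + int(seconds)


# Hypothetical (period, time_remaining) pairs, both stored as text.
plays = [('2', '09:41'), ('1', '00:35'), ('1', '10:00'), ('1', '9:58')]

# Period ascending, then time remaining descending (the clock counts down).
ordered = sorted(plays, key=lambda p: (int(p[0]), -clock_to_seconds(p[1])))
print(ordered)
# [('1', '10:00'), ('1', '9:58'), ('1', '00:35'), ('2', '09:41')]
```

With a pure string comparison, '9:58' would sort ahead of '10:00' in descending order, which is why the numeric key is used here.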




























          • OMG!!! What a wonderful answer and job!!! Thank you so much!!! Do you know if it's possible to get the player who took the shot and the time when it was taken? Thank you, thank you, thank you!!!
            – José Carlos
            Nov 23 '18 at 13:35










          • I've got a question... How did you find out that "shot-hit icon icon-point clickable" are the classes to look for? After "soup = bs4.BeautifulSoup(html,'html.parser')", did you print the code and search through it?
            – José Carlos
            Nov 23 '18 at 13:40






          • You could do that and just search through print(soup), but it's messy. Sometimes I'll just paste it into Notepad++ and look. But I think it's easier to right-click on the page, choose 'Inspect', and click around in there to see how it's structured and what tags they use. There are a bunch of video tutorials out there. I'll admit it's confusing at first, but practice makes it a bit easier... I'm still learning. It might also be possible to grab the player name and time. I'll look into it now.
            – chitown88
            Nov 23 '18 at 13:43
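As a complement to clicking around in 'Inspect', here is a rough sketch of listing which tag and class combinations occur in a page, so the repeated ones (like the shot markers) stand out. It uses Python's built-in `html.parser` rather than BeautifulSoup, only to keep the sketch dependency-free. The `ClassCounter` name and the sample HTML are made up to imitate the structure discussed here; in practice you would feed it `driver.page_source`.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each (tag, class attribute) pair occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        class_value = dict(attrs).get('class')
        if class_value:
            self.counts[(tag, class_value)] += 1


# Made-up snippet that imitates the shot chart markup.
sample_html = '''
<g class="shot-item"><svg class="shot-hit icon icon-point clickable" x="33" y="107"></svg></g>
<g class="shot-item"><svg class="shot-miss icon icon-miss clickable" x="50" y="90"></svg></g>
<g class="shot-item"><svg class="shot-hit icon icon-point clickable" x="159" y="160"></svg></g>
'''

counter = ClassCounter()
counter.feed(sample_html)

# Most frequent combinations first.
for (tag, class_value), count in counter.counts.most_common():
    print(count, tag, class_value)
```

Combinations that repeat about as often as there are markers on the chart are good candidates to look up in the inspector.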












          • Thank you @chitown88 for the help added!!!
            – José Carlos
            Nov 23 '18 at 19:01











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53439922%2fweb-scraping-how-to-get-info-from-dynamic-pages%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          3














          I'm fairly new as well, but learning as I go. It looks like this page is dynamic, so you'd need to use Selenium to load the page first, before grabbing the html with beautifulsoup to get the x and y coordinates from the Made Shots and Missed shots. So I gave it a shot and was able to get a dataframe with the x, y coords along with if it was 'made' or 'miss'.



          I plotted it afterwards just to check to see if it matched, and it appears to be flipped about the x-axis. I believe this is because when you plot on a chart like this graphically, the top, left corner is your (0,0). So your y coordinates are going to be opposite when you want to plot it. I could be wrong though.



          None the less, here's the code I used.



          import pandas as pd
          import bs4
          from selenium import webdriver

          driver = webdriver.Chrome('C:chromedriver_win32chromedriver.exe')
          driver.get('http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart')

          html = driver.page_source
          soup = bs4.BeautifulSoup(html,'html.parser')

          made_shots = soup.findAll("svg", {"class": "shot-hit icon icon-point clickable"})
          missed_shots = soup.findAll("svg", {"class": "shot-miss icon icon-miss clickable"})

          def get_coordiantes(element, label):
          results = pd.DataFrame()
          for point in element:
          x_point = float(point.get('x'))
          y_point = float(point.get('y'))
          marker = label
          temp_df = pd.DataFrame([[x_point, y_point, marker]], columns=['x','y','marker'])
          results = results.append(temp_df)
          return results

          made_results = get_coordiantes(made_shots, 'made')
          missed_results = get_coordiantes(missed_shots, 'missed')

          results = made_results.append(missed_results)
          results = results.reset_index(drop=True)

          results['y'] = results['y'] * -1

          driver.close()


          gives this output:



          In [6]:results.head(5)
          Out[6]:
          x y marker
          0 33.0 -107.0 made
          1 159.0 -160.0 made
          2 143.0 -197.0 made
          3 38.0 -113.0 made
          4 65.0 -130.0 made


          and when I plot it:



          import seaborn as sns
          import numpy as np

          # Add a column: the color depends of x and y values, but you can use whatever function.
          value=(results['marker'] == 'made')
          results['color']= np.where( value==True , "green", "red")

          # plot
          sns.regplot(data=results, x="x", y="y", fit_reg=False, scatter_kws={'facecolors':results['color']})


          enter image description here



          ADDITIONAL: I'm sure there's a better, more efficient, cleaner way to code this up. But just doing it on the fly, came up with this. It should get you going. Feel free to dive into it and look at the html source code to start seeing how it's grabbing the different data. have fun.



          import pandas as pd
          import bs4
          from selenium import webdriver

          driver = webdriver.Chrome('C:chromedriver_win32chromedriver.exe')
          driver.get('http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart')

          html = driver.page_source
          soup = bs4.BeautifulSoup(html,'html.parser')


          ###############################################################################

          shots = soup.findAll("g", {"class": "shot-item"})

          results = pd.DataFrame()
          for point in shots:
          hit = point.get('data-play-by-play-action-hit')
          action_id = point.get('data-play-by-play-action-id')
          period = point.get('data-play-by-play-action-period')
          player_id = point.get('data-play-by-play-action-player-id')
          team_id = point.get('data-play-by-play-action-team-id')

          x_point = float(point.find('svg').get('x'))
          y_point = float(point.find('svg').get('y'))

          temp_df = pd.DataFrame([[hit, action_id, period, player_id, team_id, x_point, y_point]],
          columns=['hit','action_id','period','player_id','team_id','x','y'])
          results = results.append(temp_df)

          results['y'] = results['y'] * -1
          results = results.reset_index(drop=True)



          ###############################################################################

          player_ids = soup.findAll('label', {"class": "item-label"})

          players = pd.DataFrame()
          for player in player_ids:
          player_id = player.find('input').get('data-play-by-play-action-player-id')
          if player_id == None:
          continue

          player_name = player.find('span').text

          temp_df = pd.DataFrame([[player_id, player_name]],
          columns=['player_id','player_name'])

          players = players.append(temp_df)

          players = players.reset_index(drop=True)

          ###############################################################################

          team_ids = soup.findAll('div', {"class": "header-scores_desktop"})
          teams_A = team_ids[0].find('div', {"class": "team-A"})
          team_id_A = teams_A.find('img').get('src').rsplit('/')[-1]
          team_name_A = teams_A.find('span').text
          teams_B = team_ids[0].find('div', {"class": "team-B"})
          team_id_B = teams_B.find('img').get('src').rsplit('/')[-1]
          team_name_B = teams_B.find('span').text

          teams = pd.DataFrame([[team_id_A, team_name_A],[team_id_B,team_name_B]],
          columns=['team_id','team_name'])

          teams = teams.reset_index(drop=True)

          ###############################################################################

          actions = pd.DataFrame()

          action_ids = soup.findAll('div', {"class": "overlay-wrapper"})

          for action in action_ids:
          action_id = action.get('data-play-by-play-action-id')
          time_remaining = action.find('div').find('span', {'class': 'time'}).text
          full_name = action.find('div').find('span', {'class': 'athlete-name'}).text

          if not action.find('div').find('span', {'class': 'action-code'}):
          result_of_action = '+0'
          else:
          result_of_action = action.find('div').find('span', {'class': 'action-code'}).text

          action_description = action.find('div').find('span', {'class': 'action-description'}).text

          team_A_score = action.find('div').find('span', {'class': 'team-A'}).text
          team_B_score = action.find('div').find('span', {'class': 'team-B'}).text


          temp_df = pd.DataFrame([[action_id, time_remaining, full_name, result_of_action, team_A_score, team_B_score, action_description]],
          columns=['action_id','time_remaining', 'full_name', 'result_of_action', team_name_A+'_score', team_name_B+' score', 'action-description'])

          actions = actions.append(temp_df)


          actions = actions.reset_index(drop=True)


          ###############################################################################

          results = pd.merge(results, players, how='left', on='player_id')
          results = pd.merge(results, teams, how='left', on='team_id')
          results = pd.merge(results, actions, how='left', on='action_id')

          driver.close()


          And to clean it a bit, you can sort the rows so that they are in order, play-by-play from start to finish



          results.sort_values(['period', 'time_remaining'], ascending=[True, False], inplace=True)
          results = results.reset_index(drop=True)





          share|improve this answer























          • OMG!!! What a wonderful answer and job!!! Thank you so much!!! Do you know if it's possible to know the player who made the shot and the time when is made it? Thank you, thank you, thank you!!!
            – José Carlos
            Nov 23 '18 at 13:35










          • I've got a question ... How can you find that these "shot-hit icon icon-point clickable" are the classes to seek? After "soup = bs4.BeautifulSoup(html,'html.parser')" have you print the code and search in them?
            – José Carlos
            Nov 23 '18 at 13:40






          • 1




            you could do that just search through the print (soup), but it's messy. Sometimes I'll just paste it to notepad++ and look. But it's easier I think to right click on the site, and 'Inspect' and click around in there to see how it's structured and what tags they use. There's a bunch of video tutorials out there. I'll admit, it's confusing at first, but practice with it makes it a bit easier...I'm still learning. It might be possible to also grab the player name and time. I'll look through it now.
            – chitown88
            Nov 23 '18 at 13:43












          • Thank you @chitown88 for the help added!!!
            – José Carlos
            Nov 23 '18 at 19:01
















          3














          I'm fairly new as well, but learning as I go. It looks like this page is dynamic, so you'd need to use Selenium to load the page first, before grabbing the html with beautifulsoup to get the x and y coordinates from the Made Shots and Missed shots. So I gave it a shot and was able to get a dataframe with the x, y coords along with if it was 'made' or 'miss'.



          I plotted it afterwards just to check to see if it matched, and it appears to be flipped about the x-axis. I believe this is because when you plot on a chart like this graphically, the top, left corner is your (0,0). So your y coordinates are going to be opposite when you want to plot it. I could be wrong though.



          None the less, here's the code I used.



          import pandas as pd
          import bs4
          from selenium import webdriver

          driver = webdriver.Chrome('C:chromedriver_win32chromedriver.exe')
          driver.get('http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart')

          html = driver.page_source
          soup = bs4.BeautifulSoup(html,'html.parser')

          made_shots = soup.findAll("svg", {"class": "shot-hit icon icon-point clickable"})
          missed_shots = soup.findAll("svg", {"class": "shot-miss icon icon-miss clickable"})

          def get_coordiantes(element, label):
          results = pd.DataFrame()
          for point in element:
          x_point = float(point.get('x'))
          y_point = float(point.get('y'))
          marker = label
          temp_df = pd.DataFrame([[x_point, y_point, marker]], columns=['x','y','marker'])
          results = results.append(temp_df)
          return results

          made_results = get_coordiantes(made_shots, 'made')
          missed_results = get_coordiantes(missed_shots, 'missed')

          results = made_results.append(missed_results)
          results = results.reset_index(drop=True)

          results['y'] = results['y'] * -1

          driver.close()


          gives this output:



          In [6]:results.head(5)
          Out[6]:
          x y marker
          0 33.0 -107.0 made
          1 159.0 -160.0 made
          2 143.0 -197.0 made
          3 38.0 -113.0 made
          4 65.0 -130.0 made


          and when I plot it:



          import seaborn as sns
          import numpy as np

          # Add a column: the color depends of x and y values, but you can use whatever function.
          value=(results['marker'] == 'made')
          results['color']= np.where( value==True , "green", "red")

          # plot
          sns.regplot(data=results, x="x", y="y", fit_reg=False, scatter_kws={'facecolors':results['color']})


          enter image description here



          ADDITIONAL: I'm sure there's a better, more efficient, cleaner way to code this up. But just doing it on the fly, came up with this. It should get you going. Feel free to dive into it and look at the html source code to start seeing how it's grabbing the different data. have fun.



          import pandas as pd
          import bs4
          from selenium import webdriver

          driver = webdriver.Chrome('C:chromedriver_win32chromedriver.exe')
          driver.get('http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart')

          html = driver.page_source
          soup = bs4.BeautifulSoup(html,'html.parser')


          ###############################################################################

          shots = soup.findAll("g", {"class": "shot-item"})

          results = pd.DataFrame()
          for point in shots:
          hit = point.get('data-play-by-play-action-hit')
          action_id = point.get('data-play-by-play-action-id')
          period = point.get('data-play-by-play-action-period')
          player_id = point.get('data-play-by-play-action-player-id')
          team_id = point.get('data-play-by-play-action-team-id')

          x_point = float(point.find('svg').get('x'))
          y_point = float(point.find('svg').get('y'))

          temp_df = pd.DataFrame([[hit, action_id, period, player_id, team_id, x_point, y_point]],
          columns=['hit','action_id','period','player_id','team_id','x','y'])
          results = results.append(temp_df)

          results['y'] = results['y'] * -1
          results = results.reset_index(drop=True)



          ###############################################################################

          player_ids = soup.findAll('label', {"class": "item-label"})

          players = pd.DataFrame()
          for player in player_ids:
          player_id = player.find('input').get('data-play-by-play-action-player-id')
          if player_id == None:
          continue

          player_name = player.find('span').text

          temp_df = pd.DataFrame([[player_id, player_name]],
          columns=['player_id','player_name'])

          players = players.append(temp_df)

          players = players.reset_index(drop=True)

          ###############################################################################

          team_ids = soup.findAll('div', {"class": "header-scores_desktop"})
          teams_A = team_ids[0].find('div', {"class": "team-A"})
          team_id_A = teams_A.find('img').get('src').rsplit('/')[-1]
          team_name_A = teams_A.find('span').text
          teams_B = team_ids[0].find('div', {"class": "team-B"})
          team_id_B = teams_B.find('img').get('src').rsplit('/')[-1]
          team_name_B = teams_B.find('span').text

          teams = pd.DataFrame([[team_id_A, team_name_A],[team_id_B,team_name_B]],
          columns=['team_id','team_name'])

          teams = teams.reset_index(drop=True)

          ###############################################################################

          actions = pd.DataFrame()

          action_ids = soup.findAll('div', {"class": "overlay-wrapper"})

          for action in action_ids:
          action_id = action.get('data-play-by-play-action-id')
          time_remaining = action.find('div').find('span', {'class': 'time'}).text
          full_name = action.find('div').find('span', {'class': 'athlete-name'}).text

          if not action.find('div').find('span', {'class': 'action-code'}):
          result_of_action = '+0'
          else:
          result_of_action = action.find('div').find('span', {'class': 'action-code'}).text

          action_description = action.find('div').find('span', {'class': 'action-description'}).text

          team_A_score = action.find('div').find('span', {'class': 'team-A'}).text
          team_B_score = action.find('div').find('span', {'class': 'team-B'}).text


          temp_df = pd.DataFrame([[action_id, time_remaining, full_name, result_of_action, team_A_score, team_B_score, action_description]],
          columns=['action_id','time_remaining', 'full_name', 'result_of_action', team_name_A+'_score', team_name_B+' score', 'action-description'])

          actions = actions.append(temp_df)


          actions = actions.reset_index(drop=True)


          ###############################################################################

          results = pd.merge(results, players, how='left', on='player_id')
          results = pd.merge(results, teams, how='left', on='team_id')
          results = pd.merge(results, actions, how='left', on='action_id')

          driver.close()


          And to clean it a bit, you can sort the rows so that they are in order, play-by-play from start to finish



          results.sort_values(['period', 'time_remaining'], ascending=[True, False], inplace=True)
          results = results.reset_index(drop=True)





          share|improve this answer























          • OMG!!! What a wonderful answer and job!!! Thank you so much!!! Do you know if it's possible to get the player who made the shot and the time when it was made? Thank you, thank you, thank you!!!
            – José Carlos
            Nov 23 '18 at 13:35

          • I've got a question... How did you find out that "shot-hit icon icon-point clickable" are the classes to look for? After "soup = bs4.BeautifulSoup(html,'html.parser')", did you print the code and search through it?
            – José Carlos
            Nov 23 '18 at 13:40

          • You could do that, just search through print(soup), but it's messy. Sometimes I'll just paste it into Notepad++ and look. But I think it's easier to right-click on the page, choose 'Inspect', and click around in there to see how it's structured and what tags they use. There are a bunch of video tutorials out there. I'll admit, it's confusing at first, but practice makes it a bit easier... I'm still learning. It might also be possible to grab the player name and time. I'll look through it now.
            – chitown88
            Nov 23 '18 at 13:43

          • Thank you @chitown88 for the added help!!!
            – José Carlos
            Nov 23 '18 at 19:01
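
One rough way to see which classes are worth searching for, besides the browser's Inspect panel, is to count the tag/class combinations in the rendered HTML. The sketch below uses only the standard library and a made-up snippet shaped like the markup discussed above; `ClassSurvey` is a hypothetical helper, not part of BeautifulSoup or Selenium. On the real page you would presumably feed it `driver.page_source` instead of `sample`.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassSurvey(HTMLParser):
    """Count (tag, class attribute) pairs seen in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        class_value = dict(attrs).get("class")
        if class_value:
            self.counts[(tag, class_value)] += 1


# Made-up markup that imitates the structure of the shot chart
sample = """
<g class="shot-item"><svg class="shot-hit icon icon-point clickable" x="33" y="107"></svg></g>
<g class="shot-item"><svg class="shot-miss icon icon-miss clickable" x="50" y="90"></svg></g>
<g class="shot-item"><svg class="shot-hit icon icon-point clickable" x="159" y="160"></svg></g>
"""

survey = ClassSurvey()
survey.feed(sample)
for (tag, class_value), n in survey.counts.most_common():
    print(tag, repr(class_value), n)
```

Tags that repeat once per data point (here the `g` and `svg` entries) are usually the ones to target with `find_all`.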














          I'm fairly new as well, but learning as I go. It looks like this page is dynamic, so you'd need to use Selenium to load the page first, and then grab the rendered HTML with BeautifulSoup to get the x and y coordinates of the made and missed shots. So I gave it a shot and was able to get a dataframe with the x, y coords along with whether each shot was 'made' or 'missed'.



          I plotted it afterwards just to check that it matched, and it appeared to be flipped about the x-axis. I believe this is because in graphics like this the top-left corner is (0,0) and y increases downward, so the y coordinates come out inverted when you plot them on a normal chart. That's why I multiply y by -1 in the code. I could be wrong though.
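
To make the flip concrete, here is a minimal sketch with made-up points (not scraped data). The helper name `flip_y` is just for illustration, and it assumes the source coordinates follow the usual SVG convention of an origin at the top-left with y increasing downward.

```python
def flip_y(points):
    """Negate y so that points drawn top-down plot bottom-up."""
    return [(x, -y) for x, y in points]


# Hypothetical marker positions as they might appear in SVG attributes
svg_points = [(33.0, 107.0), (159.0, 160.0)]
plot_points = flip_y(svg_points)
print(plot_points)
```

Negating y keeps the relative layout intact; it just mirrors the picture so that "up" on the court is "up" on the chart.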



          Nonetheless, here's the code I used.



          import pandas as pd
          import bs4
          from selenium import webdriver

          # Path to chromedriver; adjust for your machine.
          # (Newer Selenium releases can locate the driver themselves via webdriver.Chrome().)
          driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
          driver.get('http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart')

          # Parse the HTML as rendered by the browser, not the raw response
          html = driver.page_source
          soup = bs4.BeautifulSoup(html, 'html.parser')

          made_shots = soup.find_all("svg", {"class": "shot-hit icon icon-point clickable"})
          missed_shots = soup.find_all("svg", {"class": "shot-miss icon icon-miss clickable"})

          def get_coordinates(elements, label):
              # Collect one row per <svg> marker, then build the DataFrame once
              rows = []
              for point in elements:
                  x_point = float(point.get('x'))
                  y_point = float(point.get('y'))
                  rows.append([x_point, y_point, label])
              return pd.DataFrame(rows, columns=['x', 'y', 'marker'])

          made_results = get_coordinates(made_shots, 'made')
          missed_results = get_coordinates(missed_shots, 'missed')

          results = pd.concat([made_results, missed_results])
          results = results.reset_index(drop=True)

          # The y values grow downward on the page; negate them so the plot is the right way up
          results['y'] = results['y'] * -1

          driver.close()


          gives this output:



          In [6]: results.head(5)
          Out[6]:
                 x      y marker
          0   33.0 -107.0   made
          1  159.0 -160.0   made
          2  143.0 -197.0   made
          3   38.0 -113.0   made
          4   65.0 -130.0   made


          and when I plot it:



          import seaborn as sns
          import numpy as np

          # Add a color column: green for made shots, red for missed ones
          results['color'] = np.where(results['marker'] == 'made', "green", "red")

          # Scatter plot only (fit_reg=False turns off the regression line)
          sns.regplot(data=results, x="x", y="y", fit_reg=False, scatter_kws={'facecolors': results['color']})


          [Image: scatter plot of the shot coordinates, made shots in green and missed shots in red]



          ADDITIONAL: I'm sure there's a better, more efficient, cleaner way to code this up, but doing it on the fly, I came up with this. It also pulls the period, player, team, and play-by-play details for each shot. It should get you going. Feel free to dive into it and look at the HTML source to see where it's grabbing the different data from. Have fun.



          import pandas as pd
          import bs4
          from selenium import webdriver

          # Path to chromedriver; adjust for your machine
          driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
          driver.get('http://www.fiba.basketball/euroleaguewomen/18-19/game/2410/Nadezhda-ZVVZ-USK-Praha#|tab=shot_chart')

          html = driver.page_source
          soup = bs4.BeautifulSoup(html, 'html.parser')


          ###############################################################################
          # Shots: one <g class="shot-item"> per shot, with the details in data-* attributes

          shots = soup.find_all("g", {"class": "shot-item"})

          rows = []
          for point in shots:
              hit = point.get('data-play-by-play-action-hit')
              action_id = point.get('data-play-by-play-action-id')
              period = point.get('data-play-by-play-action-period')
              player_id = point.get('data-play-by-play-action-player-id')
              team_id = point.get('data-play-by-play-action-team-id')

              x_point = float(point.find('svg').get('x'))
              y_point = float(point.find('svg').get('y'))

              rows.append([hit, action_id, period, player_id, team_id, x_point, y_point])

          results = pd.DataFrame(rows, columns=['hit', 'action_id', 'period', 'player_id', 'team_id', 'x', 'y'])

          # Flip y so the plot is the right way up
          results['y'] = results['y'] * -1


          ###############################################################################
          # Players: map player_id to player_name

          player_ids = soup.find_all('label', {"class": "item-label"})

          rows = []
          for player in player_ids:
              player_id = player.find('input').get('data-play-by-play-action-player-id')
              if player_id is None:
                  continue

              player_name = player.find('span').text
              rows.append([player_id, player_name])

          players = pd.DataFrame(rows, columns=['player_id', 'player_name'])

          ###############################################################################
          # Teams: the team id is the last segment of the logo image URL

          team_ids = soup.find_all('div', {"class": "header-scores_desktop"})
          teams_A = team_ids[0].find('div', {"class": "team-A"})
          team_id_A = teams_A.find('img').get('src').rsplit('/')[-1]
          team_name_A = teams_A.find('span').text
          teams_B = team_ids[0].find('div', {"class": "team-B"})
          team_id_B = teams_B.find('img').get('src').rsplit('/')[-1]
          team_name_B = teams_B.find('span').text

          teams = pd.DataFrame([[team_id_A, team_name_A], [team_id_B, team_name_B]],
                               columns=['team_id', 'team_name'])

          ###############################################################################
          # Actions: play-by-play details (time, player, result, running score)

          action_ids = soup.find_all('div', {"class": "overlay-wrapper"})

          rows = []
          for action in action_ids:
              action_id = action.get('data-play-by-play-action-id')
              info = action.find('div')
              time_remaining = info.find('span', {'class': 'time'}).text
              full_name = info.find('span', {'class': 'athlete-name'}).text

              # Missed shots have no action code, so record them as '+0'
              action_code = info.find('span', {'class': 'action-code'})
              result_of_action = action_code.text if action_code else '+0'

              action_description = info.find('span', {'class': 'action-description'}).text

              team_A_score = info.find('span', {'class': 'team-A'}).text
              team_B_score = info.find('span', {'class': 'team-B'}).text

              rows.append([action_id, time_remaining, full_name, result_of_action,
                           team_A_score, team_B_score, action_description])

          actions = pd.DataFrame(rows, columns=['action_id', 'time_remaining', 'full_name', 'result_of_action',
                                                team_name_A + '_score', team_name_B + '_score', 'action-description'])


          ###############################################################################
          # Attach player, team and action details to each shot

          results = pd.merge(results, players, how='left', on='player_id')
          results = pd.merge(results, teams, how='left', on='team_id')
          results = pd.merge(results, actions, how='left', on='action_id')

          driver.close()
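
The three `pd.merge(..., how='left')` calls keep every shot row and attach the matching player, team and action columns, leaving gaps where no match exists. A plain-Python sketch of that left-join idea follows; the function `left_join` and the ids are made up for illustration, and pandas fills unmatched cells with NaN rather than None.

```python
def left_join(rows, lookup, key, fields):
    """Attach lookup fields to each row; unmatched rows get None."""
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        extra = {field: match.get(field) for field in fields}
        joined.append({**row, **extra})
    return joined


# Made-up ids and names, for illustration only
shots = [
    {"player_id": "p1", "x": 33.0, "y": -107.0},
    {"player_id": "p9", "x": 65.0, "y": -130.0},
]
players = {"p1": {"player_name": "Player One"}}

merged = left_join(shots, players, "player_id", ["player_name"])
print(merged)
```

The second shot has no matching player, yet it stays in the result. That is the reason for `how='left'`: a shot should not disappear just because a lookup table is incomplete.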


          And to clean it up a bit, you can sort the rows so that they are in play-by-play order from start to finish:



          results.sort_values(['period', 'time_remaining'], ascending=[True, False], inplace=True)
          results = results.reset_index(drop=True)
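
One caveat: `period` and `time_remaining` are scraped as text, so `sort_values` compares them as strings. That gives the right order only if the clock strings are consistently zero-padded, which I have not verified for this page. A hedged sketch of a numeric sort key, assuming an 'MM:SS' clock format and using made-up rows (`clock_to_seconds` is a hypothetical helper):

```python
def clock_to_seconds(clock):
    """Convert an 'MM:SS' game clock string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


# Made-up rows: (period, time_remaining)
plays = [("2", "09:10"), ("1", "9:45"), ("1", "10:00"), ("1", "00:30")]

# Period ascending, time remaining descending
ordered = sorted(plays, key=lambda p: (int(p[0]), -clock_to_seconds(p[1])))
print(ordered)
```

With plain string comparison, "9:45" would sort ahead of "10:00" in descending order, which is why the sketch converts the clock to seconds first.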






          edited Nov 24 '18 at 11:45
          answered Nov 23 '18 at 12:41
          chitown88

























